Composable SFT #28
base: master
Conversation
This addresses issue #23. EDITED: my calculation of the number of tunable parameters is still slightly off; I'm missing something minor. (In the file, I estimate the number of parameters for the model using pfeiffer+inv.) Yong, did you run calculations to find the number of tunable adapter parameters, or just get that number by adding an adapter to the model?
@yongzx This is ready to merge; the only thing that still needs changing is the calculation of the number of parameters to finetune.
I can help do this, no worries! It's just a running sum of trainable parameters. I will review the code over the weekend.
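For reference, the running sum amounts to something like this (a minimal sketch assuming plain PyTorch; `count_trainable_params` is just an illustrative name):

```python
import torch


def count_trainable_params(model: torch.nn.Module) -> int:
    # Sum element counts over every parameter the optimizer can update.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```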
I just did a quick read of the commit. It seems like for SFT, we don't need to modify anything in adapter-transformers?
Yep! You just need to install their code. Thanks!
Also, what I meant by this was to set the number of parameters this method changes (it's fully configurable) such that the total is equivalent to using the pfeiffer+inv adapters.
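One way to pin down that target is to measure how many parameters a pfeiffer+inv adapter actually adds and use the difference as the SFT budget. A sketch, assuming the adapter-transformers fork (the checkpoint and adapter names are just examples):

```python
from transformers import AdapterConfig, AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
base_total = sum(p.numel() for p in model.parameters())

# Add a pfeiffer+inv adapter; the parameter-count difference is the
# budget that sparse finetuning should be allowed to modify.
model.add_adapter("lang", config=AdapterConfig.load("pfeiffer+inv"))
budget = sum(p.numel() for p in model.parameters()) - base_total
print(f"Matching tunable-parameter budget: {budget:,}")
```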
I will prioritize evaluation test suites over this for now, but I hope to finish reviewing this before our meeting this Friday. |
No problem! For reference, to install the dependency now, do this instead. I'm hoping to add a …
Referring to the MLM training scripts, the training steps for full-model and sparse finetuning currently seem to be equal. Since we are comparing sparse finetuning to other adapter methods, we need to set both to 25K steps.
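With the HF `TrainingArguments`, the step count can be pinned explicitly so both runs match (a sketch; the output directory is just a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/mlm",  # placeholder path
    max_steps=25_000,          # match the adapter baselines; overrides num_train_epochs
)
```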
558f674 now supports composable SFT. Hailey, do you want to test it out? |
Yes, let me try running this with those parameters! |
Paper: https://arxiv.org/pdf/2110.07560.pdf
Code: https://github.com/cambridgeltl/composable-sft
TODOs:
- If we want to train both adapters and Composable SFT at once, this will require some extra code. Probably not too bad, but it would need extra testing to make sure all the correct parameters are frozen.
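To make that freezing concrete, the extra code would need something along these lines (a rough, hypothetical sketch: the `"adapter"` substring check and the `sft_param_names` set are assumptions about how the two methods mark their trainable parameters):

```python
import torch


def freeze_for_composition(model: torch.nn.Module, sft_param_names: set) -> None:
    # Keep adapter weights and the SFT-selected weights trainable;
    # freeze everything else in the base model.
    for name, param in model.named_parameters():
        param.requires_grad = ("adapter" in name) or (name in sft_param_names)
```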